Fixes two issues involving RCV2 updates #69

Merged
jscalev merged 7 commits into main from dev/jcalev/RunCommandUpgrades on Jan 20, 2026

jscalev (Contributor) commented Dec 8, 2025

Two separate issues caused RunCommand scripts to be re-executed, both during the upgrade from 1.3.18 to 1.3.26 and during the downgrade from 1.3.26 to 1.3.18.

For background, RunCommand (RC) prevents reruns by means of a file named mrseq, which holds the sequence number of the last executed command. In a previous version, 1.3.17 (which was never placed in production), a regression was checked in that deleted this file during the extension upgrade.

During a normal extension upgrade, the mrseq files must be migrated from the old extension version to the new one. However, the deletion happened in the disable call, which runs before the upgrade, so no mrseq files were left to migrate.

This issue was caught before 1.3.17 was released to production, but unfortunately the code actually released as 1.3.18 was identical to 1.3.17.

Besides no longer removing the mrseq files, 1.3.18 should also have included a second fix that "rehydrated" them. Because we could not prevent legacy versions from deleting these files, the extension recreated them from the .status files, which were still present.

As mentioned, the bits for 1.3.18 were the same as the faulty 1.3.17. Version 1.3.26 did contain the fixes meant for 1.3.18, but it also restricted rehydration to upgrades from versions it knew to be faulty. Because it assumed 1.3.18 was correct, it did not rehydrate when upgrading from that version.
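The version gate can be sketched as a comparison against the newest version believed to delete mrseq files. Modeling the gate as an upper bound (rather than an explicit list) and the names `needsRehydration` and `lastFaulty` are assumptions for illustration only. With `lastFaulty` set to 1.3.17, an upgrade from 1.3.18 skips rehydration, which matches the behavior described above; raising it to 1.3.18 corresponds to widening the covered versions to include 1.3.18.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "major.minor.patch" into three integers.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("unexpected version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// lessOrEqual compares two parsed versions numerically, component by component.
func lessOrEqual(a, b [3]int) bool {
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return true
}

// needsRehydration reports whether upgrading from fromVersion may have lost
// mrseq files, given the newest version assumed to delete them.
func needsRehydration(fromVersion, lastFaulty string) (bool, error) {
	from, err := parseVersion(fromVersion)
	if err != nil {
		return false, err
	}
	limit, err := parseVersion(lastFaulty)
	if err != nil {
		return false, err
	}
	return lessOrEqual(from, limit), nil
}

func main() {
	before, _ := needsRehydration("1.3.18", "1.3.17") // original gate
	after, _ := needsRehydration("1.3.18", "1.3.18")  // widened gate
	fmt.Println(before, after) // false true
}
```

Comparing numerically rather than as strings matters here: a string comparison would wrongly order "1.3.9" after "1.3.18".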

As a result, when the extension was upgraded from 1.3.18 to 1.3.26, the mrseq files were deleted by 1.3.18's disable call and never rehydrated, so the extension concluded that the commands had never run and re-ran them.

The downgrade from 1.3.26 to 1.3.18 failed for a different reason. Version 1.3.26 correctly left the mrseq files in place; the problem occurred later.

In the upgrade method, the RunCommand extension relies on the Guest Agent to tell it:
  • The extension version to which we're upgrading
  • The extension version from which we're upgrading

Unfortunately, the Linux agent has a bug: it always reports the higher extension version as the "to" version. The extension had a bug of its own as well: it read the wrong environment variable (a separate variable indicates which version the agent is invoking, and for upgrades that was always the higher one). Ultimately, though, the main problem was that the extension was told it was upgrading from 1.3.18 to 1.3.26 when the opposite was true.

As mentioned, version 1.3.26 correctly did not delete the mrseq files. However, because the direction was reported backwards, the extension looked for the mrseq files under the 1.3.18 directory, where they had already been deleted during the earlier upgrade. As a result, no mrseq files were migrated.

The fixes are thus the following:

  • Widen the range of versions covered by rehydration to include 1.3.18.
  • Because fixing the Guest Agent is too costly and would take a long time to roll out, change the migration logic to look for mrseq files in both version directories and use their location to determine the actual from/to extension versions for the upgrade.

Review comment threads on internal/cmds/cmds.go and internal/cleanup/cleanup.go
viveklingaiah self-requested a review January 12, 2026 16:43
Review comment threads on internal/cmds/cmds.go and internal/pid/pid.go
jscalev closed this Jan 13, 2026
jscalev reopened this Jan 13, 2026

D1v38om83r (Collaborator) left a comment

Can we also have a unit test for rehydrateMrSeqFilesForProblematicUpgrades if one doesn't exist already?

jscalev (Contributor, Author) commented Jan 16, 2026

Can we also have a unit test for rehydrateMrSeqFilesForProblematicUpgrades if one doesn't exist already?

This was being hit in an e2e test, but I agree that we should have more unit tests that directly hit it, so I added them.

jscalev closed this Jan 16, 2026
jscalev reopened this Jan 16, 2026
Review comment thread on internal/cmds/cmds.go
jscalev merged commit c202610 into main Jan 20, 2026
5 checks passed